Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction
نویسندگان
چکیده
Part-whole relation, ormeronymy plays an important role in many domains. Among approaches to addressing the part-whole relation extraction task, the Espresso bootstrapping algorithm has proved to be effective by significantly improving recall while keeping high precision. In this paper, we first investigate the effect of using fine-grained subtypes and careful seed selection step on the performance of extracting part-whole relation. Our multitask learning and careful seed selection were major factors for achieving higher precision. Then, we improve the Espresso bootstrapping algorithm for part-whole relation extraction task by integrating word embedding approach into its iterations. The key idea of our approach is utilizing an additional ranker component, namely Similarity Ranker in the Instances Extraction phase of the Espresso system. This ranker component uses embedding offset information between instance pairs of part-whole relation. The experiments show that our proposed system achieved a precision of 84.9% for harvesting instances of the partwhole relation, and outperformed the original Espresso system.
منابع مشابه
Perform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...
متن کاملApplying the Espresso-algorithm to large parsed corpora
Information extraction systems learn patterns for extracting pairs instantiating a given relation from text. For instance, for the relation capital of a system might learn extraction patterns such as ‘Arg1 is capital of Arg2’, or ’The embassador of arg2 was called back to Arg1’. Lightly supervised information extraction systems learn extraction patterns by means of a bootstrapping procedure, wh...
متن کاملEmploying Word Representations and Regularization for Domain Adaptation of Relation Extraction
Relation extraction suffers from a performance loss when a model is applied to out-of-domain data. This has fostered the development of domain adaptation techniques for relation extraction. This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems. We systematically explore various ways to apply word embeddings and show the best adaptation improve...
متن کاملNeural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications that includes, reading aid for blind, bank cheques and conversion of any hand written document into structural text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...
متن کاملIntegrating Verbal Idioms into an NLP System
This paper describes the integration of verbal idioms into an Natural Language Processing (NLP) system, adopting a construction approach, which is based on the prior parsing stage, so that these MultiWord Expressions (MWE) can be taken into account in subsequent tasks, such as semantic role labeling or whole-part relation extraction. The paper focuses on body-part nouns, which are often part of...
متن کامل